independent replication
Nested Atoms Model with Application to Clustering Big Population-Scale Single-Cell Data
Chakrabarti, Arhit, Ni, Yang, Jiang, Yuchao, Mallick, Bani K.
We consider the problem of clustering nested or hierarchical data, where observations are grouped and there are both group-level and observation-level variables. In our motivating OneK1K dataset, observations consist of single-cell RNA-sequencing (scRNA-seq) data from 982 individuals (groups), totaling 1.27 million cells (observations), along with individual-specific genotype data. This type of data would enable the identification of cell types and the investigation of how genetic variations among individuals influence differences in cell-type profiles. Our goal, therefore, is to jointly cluster cells and individuals to capture the heterogeneity across both levels using cell-specific gene expressions as well as individual-specific genotypes. However, existing grouped clustering methods do not incorporate group-level variables, thereby limiting their ability to capture the heterogeneity of genotypes in our motivating application. To address this, we propose the Nested Atoms Model (NAM), a new Bayesian nonparametric approach that enables the desired two-layered clustering, accounting for both group-level and observation-level variables. To scale NAM for high-dimensional data, we develop a fast variational Bayesian inference algorithm. Simulations show that NAM outperforms existing methods that ignore group-level variables. Applied to the OneK1K dataset, NAM identifies clusters of genetically similar individuals with homogeneous cell-type profiles. The resulting cell clusters align with known immune cell types based on differential gene expression, underscoring the ability of NAM to capture nested heterogeneity and provide biologically meaningful insights.
- Asia > Middle East > Jordan (0.04)
- North America > United States > Texas > Travis County > Austin (0.04)
- North America > United States > New York (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Research Report > New Finding (0.46)
- Research Report > Experimental Study (0.46)
- Health & Medicine > Therapeutic Area > Immunology (1.00)
- Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
Statistical Model Checking of NetLogo Models
Pangallo, Marco, Giachini, Daniele, Vandin, Andrea
Agent-based models (ABMs) are gaining increasing traction in several domains, due to their ability to represent complex systems that are not easily expressible with classical mathematical models. This expressivity and richness come at a cost: ABMs can typically be analyzed only through simulation, making their analysis challenging. Specifically, when studying the output of ABMs, the analyst is often confronted with practical questions such as: (i) how many independent replications should be run? (ii) how many initial time steps should be discarded as a warm-up? (iii) after the warm-up, how long should the model run? (iv) what are the right parameter values? Analysts usually resort to rules of thumb and experimentation, which lack statistical rigor. This is mainly because addressing these points takes time, and analysts prefer to spend their limited time improving the model. In this paper, we propose a methodology, drawing on the field of Statistical Model Checking, to automate the process and provide guarantees of statistical rigor for ABMs written in NetLogo, one of the most popular ABM platforms. We discuss MultiVeStA, a tool that dramatically reduces the time and human intervention needed to run statistically rigorous checks on ABM outputs, and introduce its integration with NetLogo. Using two ABMs from the NetLogo library, we showcase MultiVeStA's analysis capabilities for NetLogo ABMs, as well as a novel application to statistically rigorous calibration. Our tool-chain makes it immediate to perform statistical checks with NetLogo models, promoting more rigorous and reliable analyses of ABM outputs.
Concentration Bounds for Optimized Certainty Equivalent Risk Estimation
Ghosh, Ayon, Prashanth, L. A., Jagannathan, Krishna
We consider the problem of estimating the Optimized Certainty Equivalent (OCE) risk from independent and identically distributed (i.i.d.) samples. For the classic sample average approximation (SAA) of OCE, we derive mean-squared error as well as concentration bounds (assuming sub-Gaussianity). Further, we analyze an efficient stochastic approximation-based OCE estimator, and derive finite sample bounds for the same. To show the applicability of our bounds, we consider a risk-aware bandit problem, with OCE as the risk. For this problem, we derive bound on the probability of mis-identification. Finally, we conduct numerical experiments to validate the theoretical findings.
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Europe > Spain (0.04)
- Information Technology > Artificial Intelligence > Machine Learning (0.68)
- Information Technology > Data Science > Data Mining > Big Data (0.48)
A network-constrain Weibull AFT model for biomarkers discovery
Angelini, Claudia, De Canditiis, Daniela, De Feis, Italia, Iuliano, Antonella
We propose AFTNet, a novel network-constraint survival analysis method based on the Weibull accelerated failure time (AFT) model solved by a penalized likelihood approach for variable selection and estimation. When using the log-linear representation, the inference problem becomes a structured sparse regression problem for which we explicitly incorporate the correlation patterns among predictors using a double penalty that promotes both sparsity and grouping effect. Moreover, we establish the theoretical consistency for the AFTNet estimator and present an efficient iterative computational algorithm based on the proximal gradient descent method. Finally, we evaluate AFTNet performance both on synthetic and real data examples.
- North America > United States > New York > New York County > New York City (0.04)
- North America > United States > California > Santa Clara County > Stanford (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- (2 more...)
- Health & Medicine > Therapeutic Area > Oncology (1.00)
- Health & Medicine > Pharmaceuticals & Biotechnology (1.00)